Despite being the appearance-based classifier of choice in recent years, relatively few works have examined how much convolutional neural networks (CNNs) can improve performance on accepted expression recognition benchmarks and, more importantly, what it is they actually learn. In this work, not only do we show that CNNs can achieve strong performance, but we also introduce an approach to decipher which portions of the face influence the CNN's predictions. First, we train a zero-bias CNN on facial expression data and achieve, to our knowledge, state-of-the-art performance on two expression recognition benchmarks: the extended Cohn-Kanade (CK+) dataset and the Toronto Face Dataset (TFD). We then qualitatively analyze the network by visualizing the spatial patterns that maximally excite different neurons in the convolutional layers and show how they resemble Facial Action Units (FAUs). Finally, we use the FAU labels provided in the CK+ dataset to verify that the FAUs observed in our filter visualizations indeed align with the subject's facial movements.